Archivio della Ricerca - Università di Salerno

ART

Archivio della ricerca- Università di Roma La Sapienza

GWIDD: Genome-wide protein docking database

Author: Alfarano
Aloy
Aloy
Altschul
Fleming
Gunther
Huang
Ilya A. Vakser
Katchalski-Katzir
Kittichotirat
Kundrotas
Kundrotas
Launay
Lensink
Lu
Pagel
Petras J. Kundrotas
Petrey
Russell
Salwinski
Tarcea
Tovchigrechko
Vakser
Vakser
Xenarios
Zanzoni
Zhengwei Zhu
Publication venue: Oxford University Press
Publication date: 01/01/2010

Structural information on interacting proteins is important for understanding life processes at the molecular level. The Genome-Wide Docking Database (GWIDD) is an integrated resource for structural studies of protein–protein interactions on the genome scale, which combines the available experimental data with models obtained by docking techniques. The current database version (August 2009) contains 25 559 experimental and modeled 3D structures for 771 organisms spanning the full range of life, from viruses to humans. Data are organized in a relational database with a user-friendly search interface that allows exploration of the database content by a number of parameters. Search results can be interactively previewed and downloaded as PDB-formatted files, along with the information relevant to the specified interactions. The resource is freely available at http://gwidd.bioinformatics.ku.edu

Computation of significance scores of unweighted Gene Set Enrichment Analyses

Author: A Subramanian
A Zanzoni
Andreas Keller
C Backes
C Backes
Christina Backes
E Rubin
H Hermjakob
H Lee
Hans-Peter Lenhof
J Küntzer
J Lamb
L Salwinski
M Kanehisa
M Krull
S Kim
S Peri
S Wachi
T Barrett
TGO Consortium
V Matys
V Mootha
Y Benjamini
Y Hochberg
Z Jiang
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: Gene Set Enrichment Analysis (GSEA) is a computational method for the statistical evaluation of sorted lists of genes or proteins. GSEA was originally developed for interpreting microarray gene expression data, but it can be applied to any sorted list of genes. Given the gene list and an arbitrary biological category, GSEA evaluates whether the genes of the considered category are randomly distributed or accumulate at the top or bottom of the list. Usually, significance scores (p-values) of GSEA are computed by nonparametric permutation tests, a time-consuming procedure that yields only estimates of the p-values. Results: We present a novel dynamic programming algorithm for calculating exact significance values of unweighted Gene Set Enrichment Analyses. Our algorithm avoids typical problems of nonparametric permutation tests, such as findings that vary between runs because of the random sampling procedure. Another advantage of the presented dynamic programming algorithm is its runtime and memory efficiency. To test our algorithm, we applied it not only to simulated data sets but also to expression profiles of squamous cell lung cancer tissue and autologous unaffected tissue.
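The idea of an exact p-value for an unweighted enrichment statistic can be illustrated with a small sketch. It is not taken from the paper: it assumes a running sum that adds (m − l) for each list position inside the category and subtracts l otherwise (m genes in total, l in the category), takes the maximum absolute deviation from zero as the statistic, and counts by dynamic programming the orderings whose running sum never reaches the observed value. The function names are hypothetical.

```python
from math import comb

def running_sum_statistic(in_set):
    """Maximum absolute deviation of the unweighted running sum.

    in_set: list of booleans in ranked order (True = gene is in the category).
    Assumed convention: +(m - l) for a hit, -l for a miss.
    """
    m, l = len(in_set), sum(in_set)
    rs, best = 0, 0
    for hit in in_set:
        rs += (m - l) if hit else -l
        best = max(best, abs(rs))
    return best

def exact_p_value(in_set):
    """Fraction of all C(m, l) orderings whose statistic is >= the observed one."""
    m, l = len(in_set), sum(in_set)
    observed = running_sum_statistic(in_set)
    # ways[k]: number of prefixes of the current length with k hits whose
    # running sum stayed strictly inside (-observed, observed) throughout.
    ways = [0] * (l + 1)
    ways[0] = 1
    for i in range(1, m + 1):
        new = [0] * (l + 1)
        for k in range(max(0, i - (m - l)), min(i, l) + 1):
            rs = k * (m - l) - (i - k) * l  # running sum depends only on (i, k)
            if abs(rs) >= observed:
                continue  # this prefix already reaches the observed deviation
            total = ways[k]            # position i is a miss
            if k > 0:
                total += ways[k - 1]   # position i is a hit
            new[k] = total
        ways = new
    return 1 - ways[l] / comb(m, l)

# Toy ranked list: both category genes sit at the top of four genes.
print(exact_p_value([True, True, False, False]))
```

Because the running sum at position i is fully determined by the number of hits so far, the table has only O(m · l) states, which is where the efficiency of this style of counting comes from.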

Directory of Open Access Journals

Walk-weighted subsequence kernels for protein-protein interaction extraction

Author: A Airola
A Bairoch
A Culotta
A Moschitti
A Zanzoni
B Boeckmann
C Giuliano
C Hsu
D Sleator
G Zhou
GD Bader
H Lodhi
J Hakenberg
J Kim
J Shawe-Taylor
Jihoon Yang
Juntae Yoon
K Fundel
M Huang
M Lease
M Miwa
M Miwa
R Bunescu
R Sætre
S Aubin
S Pyysalo
S Riedel
Seog Park
Seonho Kim
SH Kim
SM Harabagiu
T Ono
TH Cormen
Y Miyao
Publication venue: BioMed Central
Publication date: 01/01/2010

Lists2Networks: Integrated analysis of gene/protein lists

Author: A Hamosh
A Lachmann
A Ma'ayan
A Ma'ayan
A Ma'ayan
A Ma'ayan
A Subramanian
A Zanzoni
Alexander Lachmann
Avi Ma'ayan
B Zhang
C Stark
D Nam
D Van Hoof
DS Wishart
DW Huang
G Dennis
GD Bader
Gene Ontology Consortium
GR Mishra
H Hermjakob
H Ogata
H Ogata
I Xenarios
J Wang
J-F Rual
LM Brill
M Masseroli
P Shannon
R Lu
S Doniger
S Griffiths-Jones
SG Grant
SI Berger
T Beuming
T Hulsen
T Obayashi
U Stelzl
V Cordeddu
VK Mootha
W Fury
Publication venue: BioMed Central
Publication date: 01/01/2010

Analysis of protein sequence and interaction data for candidate disease gene prediction

Author: Adie
Altschul
Ashburner
Badano
Badano
Bader
Bader
Bandyopadhyay
Bateman
Benson
Brown
Diane Fatkin
Franke
Freudenberg
Gandhi
George
George
Hamosh
Ingham
Jason Y. Liu
Jimenez-Sanchez
Jones
Kanehisa
Kelso
Lina L. Feng
McCarthy
Merridee A. Wouters
Mulder
Oti
Pearson
Perez-Iratxeta
Perez-Iratxeta
Peri
Ramani
Richard A. George
Robert J. Bryson-Richardson
Rual
Rudd
Smith
Smyth
Stelzl
Tiffin
Tiffin
Turner
van Driel
von Mering
Wheeler
Zanzoni
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2006

Linkage analysis is a successful procedure for associating diseases with specific genomic regions. These regions are often large, containing hundreds of genes, which makes the experimental methods used to identify the disease gene arduous and expensive. We present two methods to prioritize candidates for further experimental study: Common Pathway Scanning (CPS) and Common Module Profiling (CMP). CPS is based on the assumption that common phenotypes are associated with dysfunction in proteins that participate in the same complex or pathway. CPS applies network data derived from protein–protein interaction (PPI) and pathway databases to identify relationships between genes. CMP identifies likely candidates using a domain-dependent sequence similarity approach, based on the hypothesis that disruption of genes of similar function will lead to the same phenotype. Both algorithms use two forms of input data: known disease genes or multiple disease loci. When using known disease genes as input, our combined methods have a sensitivity of 0.52 and a specificity of 0.97 and reduce the candidate list 13-fold. Using multiple loci, our methods successfully identify disease genes for all benchmark diseases with a sensitivity of 0.84 and a specificity of 0.63. Our combined approach prioritizes good candidates and will accelerate the disease gene discovery process.
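For readers unfamiliar with the two reported measures, the following sketch shows how sensitivity and specificity are computed for a candidate-gene prioritization. The gene identifiers and the function name are invented for illustration and are unrelated to the paper's benchmark data.

```python
def sensitivity_specificity(candidates, predicted, true_disease_genes):
    """Return (sensitivity, specificity) of a prediction over a candidate list.

    Assumes the candidate list contains at least one true disease gene and
    at least one gene that is not a disease gene.
    """
    candidates = set(candidates)
    predicted = set(predicted) & candidates
    positives = set(true_disease_genes) & candidates
    negatives = candidates - positives
    true_pos = len(predicted & positives)
    false_pos = len(predicted & negatives)
    sensitivity = true_pos / len(positives)                      # TP / (TP + FN)
    specificity = (len(negatives) - false_pos) / len(negatives)  # TN / (TN + FP)
    return sensitivity, specificity

# Hypothetical locus with ten candidate genes, two of them true disease genes.
candidates = [f"g{i}" for i in range(1, 11)]
print(sensitivity_specificity(candidates, {"g1", "g3"}, {"g1", "g2"}))
```

A high specificity means that few non-disease genes survive the filter, which is what shortens the candidate list.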

CiteSeerX

Deakin Research Online

Exploiting likely-positive and unlabeled data to improve the identification of protein-protein interaction articles

Author: A Yakushiji
A Zanzoni
AR Mendelsohn
B Liu
BJ Breitkreutz
EM Marcotte
GD Bader
Hong-Jie Dai
Hsi-Chuan Hung
I Xenarios
J Thomas
JA Hanley
JM Temkin
LM Manevitz
M Krallinger
M Lan
N Cristianini
Richard Tzong-Han Tsai
S Fields
S Fujita
S Peri
S Robertson
T Joachims
T Ono
U Güldener
Wen-Lian Hsu
Y Hao
Yi-Wen Lin
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Experimentally verified protein-protein interactions (PPI) cannot be easily retrieved by researchers unless they are stored in PPI databases. The curation of such databases can be made faster by ranking newly published articles by their relevance to PPI, a task that we approach here by designing a machine-learning-based PPI classifier. All classifiers require labeled data, and the more labeled data available, the more reliable they become. Although many PPI databases with large numbers of labeled articles are available, incorporating these databases into the base training data may actually reduce classification performance, since the supplementary databases may not annotate exactly the same PPI types as the base training data. Our first goal in this paper is to find a method of selecting likely positive data from such supplementary databases. Extracting only likely positive data, however, will bias the classification model unless sufficient negative data is also added. Unfortunately, negative data is very hard to obtain because there are no resources that compile such information. Therefore, our second aim is to select such negative data from unlabeled PubMed data. Thirdly, we explore how to exploit these likely positive and negative data. Lastly, we look at the somewhat unrelated question of which term-weighting scheme is most effective for identifying PPI-related articles. Results: To evaluate the performance of our PPI text classifier, we conducted experiments based on the BioCreAtIvE-II IAS dataset. Our results show that adding likely-labeled data generally increases AUC by 3–6%, indicating better ranking ability. Our experiments also show that our newly proposed term-weighting scheme has the highest AUC among all common weighting schemes. Our final model achieves an F-measure and an AUC that are 2.9% and 5.0% higher, respectively, than those of the top-ranking system in the IAS challenge.
Conclusion: Our experiments demonstrate the effectiveness of integrating unlabeled and likely labeled data to augment a PPI text classification system. Our mixed model is suitable for ranking purposes, whereas our hierarchical model is better for filtering. In addition, our results indicate that supervised weighting schemes outperform unsupervised ones. Our newly proposed weighting scheme, TFBRF, which considers documents that do not contain the target word, avoids some of the biases found in traditional weighting schemes. Our experimental results show TFBRF to be the most effective among several other top weighting schemes.
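Since the abstract uses AUC as its measure of ranking ability, a minimal sketch of that metric may help. It computes AUC as the fraction of (relevant, irrelevant) article pairs in which the relevant article receives the higher score, counting ties as half. The scores and labels are invented, and the function is an illustration rather than the evaluation code used in the study.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    scores: classifier scores, higher meaning more likely PPI-relevant.
    labels: truthy for relevant articles, falsy for irrelevant ones.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Four hypothetical articles; the second and fourth are not PPI-relevant.
print(auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))
```

The pairwise form is quadratic in the number of articles; it is chosen here for clarity, not speed.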

Directory of Open Access Journals

SIDEKICK: Genomic data driven analysis and decision-making framework

Author: A Rowe
A Zanzoni
AP Dempster
AP Dempster
C Alfarano
C Stark
G Bindea
G Joshi-Tope
H Hermjakob
H Parkinson
H Ramos
HB Fraser
J Goodman
JC Bare
JD Han
Kay A Robbins
Kihoon Yoon
L Salwinski
M Castellano
M Doderer
M Jayapandian
M Reich
Mark S Doderer
P Pagel
PT Shannon
S Grossmann
S Mathivanan
S Matos
S Peri
S Pounds
SN Goodman
T Beuming
T Cover
U Stelzl
Z Du
Publication venue: BioMed Central
Publication date: 01/12/2010

Background: Scientists striving to unlock mysteries within complex biological systems face myriad barriers in effectively integrating available information to enhance their understanding. While experimental techniques and available data sources are rapidly evolving, useful information is dispersed across a variety of sources, and sources of the same information often do not use the same format or nomenclature. To harness these expanding resources, scientists need tools that bridge nomenclature differences and allow them to integrate, organize, and evaluate the quality of information without extensive computation. Results: Sidekick, a genomic data-driven analysis and decision-making framework, is a web-based tool that provides a user-friendly, intuitive solution to the problem of information inaccessibility. Sidekick enables scientists without training in computation and data management to pursue answers to research questions like "What are the mechanisms for disease X?" or "Does the set of genes associated with disease X also influence other diseases?" Sidekick supports combining heterogeneous data, finding and maintaining the most up-to-date data, evaluating data sources, quantifying confidence in results based on evidence, and managing the multi-step research tasks needed to answer these questions. We demonstrate Sidekick's effectiveness by showing how to accomplish a complex published analysis in a fraction of the original time and with no computational effort. Conclusions: Sidekick is an easy-to-use web-based tool that organizes and facilitates complex genomic research, allowing scientists to explore genomic relationships and formulate hypotheses without computational effort. Possible analysis steps include gene list discovery, gene-pair list discovery, various enrichments for both types of lists, and convenient list manipulation.
Further, Sidekick's ability to characterize pairs of genes offers ways to approach genomic analysis that traditional single-gene lists do not, particularly in areas such as interaction discovery.

Directory of Open Access Journals

ChemProt: a disease chemical biology database

Author: A. Bora
Bader
Camon
Chamba
Chen
D. Edsgard
Durant
F. S. Roque
Guldener
Halden
Hamosh
Hermjakob
Hewett
I. Kouskoumvekaki
Joshi-Tope
K. Audouze
Kanehisa
Keiser
Keiser
Knight
Kuhn
Lage
Mestres
Mestres
Mishra
N. Weinhold
O'Brien
O. Taboureau
Oprea
Pafilis
Ponten
R. Curpan
Roth
Rual
S. Brunak
S. K. Nielsen
Safran
Salwinski
Stark
T. I. Oprea
T. S. Jensen
Weill
Willett
Wishart
Yıldırım
Zanzoni
Publication venue: Oxford University Press
Publication date: 01/01/2011

Systems pharmacology is an emergent area that studies drug action across multiple scales of complexity, from molecular and cellular to tissue and organism levels. There is a critical need to develop network-based approaches to integrate the growing body of chemical biology knowledge with network biology. Here, we report ChemProt, a disease chemical biology database, which is based on a compilation of multiple chemical–protein annotation resources, as well as disease-associated protein–protein interactions (PPIs). We assembled more than 700 000 unique chemicals with biological annotation for 30 578 proteins. We gathered over 2 million chemical–protein interactions, which were integrated in a quality-scored human PPI network of 428 429 interactions. The PPI network layer allows for studying disease and tissue specificity through each protein complex. ChemProt can assist in the in silico evaluation of environmental chemicals, natural products and approved drugs, as well as the selection of new compounds based on their activity profile against most known biological targets, including those related to adverse drug events. Results from the disease chemical biology database associate citalopram, an antidepressant, with osteogenesis imperfecta and leukemia, and bisphenol A, an endocrine disruptor, with certain types of cancer. The server can be accessed at http://www.cbs.dtu.dk/services/ChemProt/

CiteSeerX